Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Authors

  • Bashirpour, M. Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran.
  • Geravanchizadeh, M. Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran.
Abstract:

Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate their performance in emotion recognition using clean and noisy speech materials and compare it with that of the well-known MFCC, LPCC, RASTA-PLP, and TEMFCC features. Speech samples are extracted from the Berlin emotional speech database (Emo-DB) and the Persian emotional speech database (Persian ESD) and corrupted with four different noise types at various SNR levels. The experiments are conducted in clean-train/noisy-test scenarios to simulate practical conditions with noise sources. Simulation results show that PNCC achieves higher recognition rates than the conventional features under noisy conditions.
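The key difference between PNCC and MFCC is the compression stage: full PNCC (as described by Kim and Stern) uses a gammatone filterbank, medium-time power-bias subtraction, and a power-law nonlinearity with exponent 1/15 in place of MFCC's logarithm. The sketch below, a simplified illustration and not the authors' implementation, keeps only the short-time power spectrum, a triangular (mel) filterbank, and the power-law compression; the function name `pncc_like` and all parameter defaults are our own assumptions.

```python
import numpy as np

def pncc_like(signal, sr=16000, n_fft=512, hop=160, n_bands=40, n_ceps=13):
    """Simplified PNCC-style features (sketch, not full PNCC).

    Omits the gammatone filterbank, medium-time bias subtraction,
    and temporal masking of real PNCC; keeps the power-law
    nonlinearity (exponent 1/15) that replaces MFCC's log.
    """
    # Frame the signal and compute the short-time power spectrum.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hamming(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2   # (n_frames, n_fft//2+1)

    # Triangular mel filterbank (stand-in for PNCC's gammatone bank).
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_bands + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_bands, n_fft // 2 + 1))
    for b in range(n_bands):
        lo, c, hi = bins[b], bins[b + 1], bins[b + 2]
        if c > lo: fbank[b, lo:c] = (np.arange(lo, c) - lo) / (c - lo)
        if hi > c: fbank[b, c:hi] = (hi - np.arange(c, hi)) / (hi - c)
    band_power = power @ fbank.T

    # Power-law nonlinearity instead of log compression.
    compressed = np.power(band_power + 1e-10, 1.0 / 15.0)

    # DCT-II to decorrelate; keep the first n_ceps coefficients.
    n = np.arange(n_bands)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_bands))
    return compressed @ dct.T                          # (n_frames, n_ceps)
```

The power-law stage is what gives PNCC its noise robustness relative to the log in MFCC: for small filterbank energies, the 1/15 root varies far less steeply than the logarithm, so additive noise in low-energy bands perturbs the cepstral coefficients less.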


Similar resources

Perceptual harmonic cepstral coefficients for speech recognition in noisy environment

Perceptual harmonic cepstral coefficients (PHCC) are proposed as features to extract from speech for recognition in noisy environments. A weighting function, which depends on the prominence of the harmonic structure, is applied to the power spectrum to ensure accurate representation of the voiced speech spectral envelope. The harmonics-weighted power spectrum undergoes mel-scaled band-pass filt...


Speech Emotion Recognition Based on Deep Belief Networks and Wavelet Packet Cepstral Coefficients

A wavelet packet based adaptive filter-bank construction combined with a Deep Belief Network (DBN) feature learning method is proposed for speech signal processing in this paper. On this basis, a set of acoustic features is extracted for speech emotion recognition, namely Coiflet Wavelet Packet Cepstral Coefficients (CWPCC). CWPCC extends the conventional Mel-Frequency Cepstral Coefficients (MFCC)...


On compensating the Mel-frequency cepstral coefficients for noisy speech recognition

This paper describes a novel noise-robust automatic speech recognition (ASR) front-end that employs a combination of Mel-filterbank output compensation and cumulative distribution mapping of cepstral coefficients with truncated Gaussian distribution. Recognition experiments on the Aurora II connected digits database reveal that the proposed front-end achieves an average digit recognition accura...


Perceptual MVDR-based cepstral coefficients (PMCCs) for robust speech recognition

This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the Minimum Variance Distortionless Response (MVDR) method of spectrum estimation. We incorporate perceptual information directly into the spectrum estimation. This provides improved robustness and computational efficiency when compared with the previously proposed MVDR-MFC...


Recognition of noisy speech using normalized moments

Spectral subband centroid, which is essentially the first-order normalized moment, has been proposed for speech recognition, and its robustness to additive noise has been demonstrated before. In this paper, we extend this concept to the use of normalized spectral subband moments (NSSM) for robust speech recognition. We show that normalized moments, if properly selected, yield comparable recogn...
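The first-order normalized moment mentioned above is simply each subband's energy-weighted mean frequency, c = Σ f·P(f) / Σ P(f) over the band. A minimal illustration of that definition (the helper name `subband_centroids` and its interface are our own, not the paper's NSSM formulation):

```python
import numpy as np

def subband_centroids(power_spec, band_edges, sr=16000):
    """Spectral subband centroid: the first-order normalized moment
    of a power spectrum within each subband.

    power_spec : (n_bins,) one-frame power spectrum (e.g. |rfft|**2)
    band_edges : list of (low_bin, high_bin) index pairs per subband
    """
    # Frequency of each FFT bin, assuming power_spec spans 0..sr/2.
    freqs = np.arange(len(power_spec)) * (sr / 2) / (len(power_spec) - 1)
    centroids = []
    for lo, hi in band_edges:
        p = power_spec[lo:hi]
        f = freqs[lo:hi]
        # centroid = sum(f * P(f)) / sum(P(f)); eps guards against 0/0
        centroids.append(float(np.sum(f * p) / (np.sum(p) + 1e-12)))
    return centroids
```

Because the centroid is a ratio of spectral sums, broadband additive noise that raises numerator and denominator proportionally shifts it far less than it shifts raw band energies, which is the intuition behind its noise robustness.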


Acoustic Emotion Recognition Using Linear and Nonlinear Cepstral Coefficients

Recognizing human emotions through the vocal channel has gained increased attention recently. In this paper, we study how the features used and the classifiers impact the recognition accuracy of emotions present in speech. Four emotional states are considered for classification of emotions from speech in this work. For this aim, features are extracted from audio characteristics of emotional speech using Linea...





Volume 12, Issue 3

Pages 197-205

Publication date: 2016-09


